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We propose improved versions of the standard diffusion Monte Carlo (DMC) and the lattice regularized 
diffusion Monte Carlo (LRDMC) algorithms. For the DMC method, we refine a scheme recently devised to 
treat non-local pseudopotential in a variational way. We show that such scheme -when applied to large enough 
systems- maintains its effectivness only at correspondingly small enough time-steps, and we present two simple 
upgrades of the method which guarantee the variational property in a size-consistent manner. For the LRDMC 
method, which is size-consistent and variational by construction, we enhance the computational efficiency by 
introducing (i) an improved definition of the effective lattice Hamiltonian which remains size-consistent and 
entails a small lattice-space error with a known leading term, and (ii) a new randomization method for the 
positions of the lattice knots which requires a single lattice-space. 



I. INTRODUCTION 

The fixed-node diffusion Monte Carlo (DMC) method is of- 
ten the method of choice for accurate computations of many- 
body systems 1 . Since the scaling of DMC with the number of 
electrons N is a modest iV 4 , the method has been employed 
in recent years to accurately compute electronic properties of 
large molecular and solid systems where conventional highly- 
correlated quantum chemistry approaches are very difficult to 
apply. Unfortunately, for full-core atoms the computational 
cost of DMC increases approximatelyEE as Z 5 - 5 ~ 6 - 5 with the 
atomic number Z. Therefore, the use of pseudopotentials is 
an essential ingredient in the application of DMC to complex 
systems to reduce the effective value of Z and significantly 
improve the efficiency of the method. 

The use of pseudopotentials in DMC poses however a prob- 
lem since pseudopotentials are usually non-local and the non- 
locality introduces a fermionic sign problem additional to the 
one due to the anti-symmetry of the electronic wave function. 
The commonly adopted solution is to "localize" the non-local 
pseudopotential on the trial wave function and use this local 
potential in the DMC simulation 4 ^. Unfortunately, the so- 
called locality approximation (LA) does not ensure variation- 
ality and alternative schemes employing a different effective 
Hamiltonian were recently introduced to overcome this diffi- 
culty^] 

In the lattice regularized diffusion Monte Carlo (LRDMC) 
algorithm 6 , both the Laplacian and the non-local pseudopo- 
tentials are discretized such that the corresponding imaginary 
time propagator (exp[— tH]) assumes non-zero values only 
on a finite set of points, and the lattice Green function Monte 
Carlo algorithm can be employed, that ensures variationality 
and stability all along the simulation 9 . Alternatively, another 
scheme which is based on the standard DMC algorithm was 
developed 7 . The latter exploits the discretization of the prop- 
agator only in the part depending on the non-local pseudopo- 
tentials, and a non-local effective Hamiltonian is defined in 
order to fulfill the fixed node constraint. Here, we show that 



this variational DMC scheme is however not size consistent at 
finite time-steps. Indeed, the time-step error strongly depends 
on the system size and, upon increasing the number of par- 
ticles at fixed time-step, the corresponding energies approach 
those given by DMC with LA. In this paper, we explain how to 
cure this problem and present a simple formulation of the al- 
gorithm which is size-consistent and suffers at the same time 
from a smaller time-step error. Moreover, we define a bet- 
ter discretization rule for the LRDMC effective Hamiltonian, 
which reduces the lattice-space bias, remains size consistent 
as in the original formulation, and improves the efficiency of 
the method. 

In Section [II] we briefly summarize the problems intro- 
duced by the use of non-local pseudopotentials in the standard 
DMC and describe in detail the variational DMC algorithm of 
Ref. [7] to treat pseudopotentials beyond the commonly used 



LA. In Section III we present our size-consistent variational 
approach to non-local pseudopotentials in DMC and demon- 
strate its effectiveness on a series of oxygen systems of in- 



creasing size. In Section IV we briefly describe the LRDMC 



method, which is variational by its own nature, and give a 
better prescription for the lattice regularization of the continu- 
ous Hamiltonian to always guarantee a well defined and faster 
zero lattice-space extrapolation. Finally, in Section[V] we dis- 
cuss the behavior of the discretization error of the different 
DMC algorithms (in time or space as appropriate) and com- 
ment on the relative efficiency of the methods presented here. 



II. DMC AND NON-LOCAL PSEUDOPOTENTIALS 

In DMC, the projection to the ground state wave function of 
an Hamiltonian H is performed by stochastically applying the 
operator cxp[— tH] to a trial wave function ^t- If the pro- 
jection is formulated in real space and importance sampling 
introduced, the mixed distribution /(R, t) = \fr T (R)\I/(R, t) 



2 



is then propagated as 

/(R',t + T)= y dRG(R',R, t)/(R, t) , (1) 
where the importance sampling Green's function is defined as 

G(R',R,r) = |^(R'|exp[-r^]|R). (2) 

The fixed-node (FN) approximation is usually employed for 
fermionic systems to avoid the collapse to the bosonic ground 
state. In continuous systems, it is implemented by constrain- 
ing the diffusion process within the nodal pockets of the trial 
wave function. For long times, the distribution /(R, t) ap- 
proaches 5't(R) x I'fn(R) where 4 r pj\j(R) is the ground state 
wave function consistent with the boundary condition that it 
vanishes at the nodes of 'fx- The FN energy is an upper bound 
to the true fermionic ground state energy. 

When a non-local potential V NL is employed to remove the 
core electrons, the off-diagonal elements of the Hamiltonian 
in real space are generally non-zero and the standard DMC 
approach cannot be applied. If we analyze the behavior of the 
propagator at short time-steps: 



(R'| exp[-rH]|R) « <5 R ,, R - r<R'|W|R) 



(3) 



we note that, while the diagonal elements can always be made 
positive by choosing r small enough, the off-diagonal ele- 
ments are positive if and only if the off-diagonal elements of 
the Hamiltonian are non-positive. This condition is not al- 
ways met in the presence of non-local pseudopotentials, so the 
fermionic sign problem reappears even if one works in the FN 
approximation. Consequently, in addition to the FN approxi- 
mation, the LA is commonly introduced where the non-local 
potential V NL is replaced by a local quantity V LA obtained by 
"localizing" the potential on the trial wave function: 



V LA (R) = 



(RjyNLjjg 

<R|*t) 



(4) 



The DMC algorithm in the LA yields the FN ground state of 
the effective Hamiltonian 'H LA with the local potential V LA 
instead of the original non-local V NL . The fixed-node energy 
in the LA is equal to 
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LA _ <^nI^ LA Wn 
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(5) 
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and estimated by sampling the mixed distribution ^t^fn as 
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(6) 



Since "fp^ is the fixed-node ground state of 'H LA and not of 
the original Hamiltonian H, the mixed average energy of H is 
not equal to its expectation value on the wave function ^pN' 
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(7) 



- FN I FN 



Therefore, E^ A is in general not an upper bound to the ground 
state of T~L and the variational principle may not apply. 



A. Beyond the locality approximation 

The lattice regularized diffusion Monte Carlo (LRDMC) al- 
gorithm was recently developed to overcome this difficulty 6 
and then extended to the continuum formulation of DMC 7 . 
Both algorithms provide a variational scheme to treat non- 
local pseudopotentials in DMC by introducing an effective 
Hamiltonian, different from the one used in the LA approx- 
imation, which provides an upper bound to the ground state 
of the original Hamiltonian. We briefly describe here the al- 
gorithm in the framework of continuum DMC. 

We first apply a Trotter expansion for small time-steps to 
the importance sampling Green's function, 

G(R',R,t)« J dR"T NL (R',R",r)G loc (R",R,r), (8) 

where we have split the Hamiltonian into a local and a non- 
local operator. The propagator G loc (R' , R, r) is equal to the 
drift-diffusion-branching Green's function for the local com- 
ponent of the Hamiltonian, 
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= -[R'-R-tV(R)] 2 /2t tE£c(R') 



(9) 



where the velocity is defined as V(R) = V* t (R)/ v I't(R) 
and £l c (R) = 'H 1oc *t(R)/*t(R) is the local energy of 
the local part of the Hamiltonian (kinetic JC plus local poten- 
tial V loc ). The transition T NL contains the non-local potential, 



T (R', R, 



* T (R') 



*t(R) 
<5r\r - tVr' ji 



R'|exp[-rV 1NL ]|R; 



(10) 



where Vr %r = * t (R')/*t(R)(R'|V nl |R). In both the 
variational Monte Carlo and the standard DMC method with 
the LA approximation, one adopts a quadrature rule with a 
discrete mesh of points, belonging to a regular polyhedron 
used to evaluate the projection of the non-local component 
on a given trial wave function. 

Consequently, the number of elements Vr<r is finite and 
the transition T NL corresponds to the move of one electron on 
the grid obtained by considering the union of the quadrature 
points generated for each electron and pseudoatom (center of 
a non-local pseudopotential). Moreover, in order to work with 
a small quadrature mesh, the vertices of the polyhedron are de- 
fined in a frame rotated by and <fi, the azimuthal and planar 
angle respectively, which are taken randomly for each elec- 
tron. 

As discussed above, performing a transition based on T NL 
poses however a problem since T NL can be negative given 
that both * T (R')/* T (R) and (R'|V NL |R) can change sign. 
A solution is to apply the FN approximation not only to G loc 
but also to T NL by keeping only the transition elements which 
are positive: 



2£&(R',R,t) = <W-tVj 



R',R 



(ID 



where Vf£, R 



[Vk'.R ± I^R',r|]/2. The discarded ele- 
ments are included in the so-called sign-flip term, which is 
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then added to the diagonal local potential as 
V^(R) = V loc (R) + 5>+, R . 



(12) 
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(R|Heff|R> 
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(R|/C|R) + V>° ff c (R) 

(R'|V NL |R) if Vk',R< 0,(13) 



and yields the same local energy as the original Hamiltonian 
T~L. In contrast to the LA Hamiltonian, its ground state energy 
is an upper bound to the ground state energy of the true Hamil- 
tonian. Therefore, the variational principle is recovered and, 
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sition cures the instabilities which are commonly observed in 
a DMC run with the LA Hamiltonian and are due to the nega- 
tive divergences of the localized potential on the nodes of \&t- 
In the branching term of G loc (Eq. p\, the local potential 



V loc is replaced by V^ c (Eq.[12l and the weights of the walk- 
ers are multiplied by an additional factor which enters in the 
normalization of the transition Tfn and to order r is equal to: 

]TT F N N L (R',R) « exp (-tXV^r) . (14) 

R' V R' / 

The weights are therefore given by 



w 



WrfE E T ™ (R', R) = exp [-tE l (R)\ , (15) 



R' 



where E h (R) = W off * T (R)/*T(R) = W* t (R)/*t(R). 

The basic algorithm as proposed in Ref. 7 therefore consists 
of the following steps: 

1. The walker drifts and diffuses from R to R'. The move 
is followed by an accept/reject step as in standard DMC. 

2. The weight of the walker is multiplied by the branching 
factor exp [— t (-El(R') — Et)] where the trial energy 
Et has been introduced. 

3. The walker moves to R" according to the transition 
probability I*£(R", R')/ £ R/ „ T F N N L (R", R'). 

For large systems, the first step is implemented not by moving 
all the electron together but by sequentially drifting and dif- 
fusing each electron and applying the accept/reject step after 
each single-electron (SE) move. 



III. SIZE-CONSISTENCY 

In the move governed by the transition Tp^ 1 , only one elec- 
tron is displaced on the grid of the quadrature points gener- 
ated by considering all the pseudoatoms and all the electrons. 
Therefore, for given time-step, the probability of a successful 
move will increase with the system size (i.e. the number of 
electrons) and saturate to one for sufficiently large systems. 
In this limit, the effect of the move will become independent 
of the system size and lead to one electron being displaced at 
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The resulting effective Hamiltonian H c h is therefore given by g 
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1 : Acceptance of the Tp^ move as a function of time-step. The 



increasing dotted curves correspond to systems with 1, 2, 4, 8, 16 
and 32 oxygen atoms aligned at a distance of 30 A from each other. 
The dotted curves are obtained with the algorithm of Ref. 7 while the 
lowest continuum curve is obtained with the size-consistent DMC 
algorithm (version 1) we propose. We only show the size-consistent 
curve with 1 atom as it is indistinguishable from the ones obtained 
for the larger systems. 



each step. Therefore, for sufficiently large systems, the overall 
impact of the non-local move will decrease and the algorithm 
will effectively behaves more and more like in the LA proce- 
dure. 

To demonstrate the size-consistency problem of the algo- 
rithm as originally formulated in RefR we consider a series 
of systems consisting of an increasing number M of oxygen 
atoms aligned 30 A apart. The oxygen atom is described 
by an s-non-local energy-consistent Hartree-Fock pseudopo- 
tential 10 . The trial wave function is of the Jastrow-Slater 
type with a single determinant expressed on a cc-pVDZ ba- 
sis^and a Jastrow factor which includes electron-electron and 
electron-nucleus terms 11 . All Jastrow and orbital parameters 
are optimized in energy minimization 12 for a single atom and 
the wave function of a system with more than one oxygen is 
obtained by replicatingthe wave function of one atom on the 
other centers. In Fig. u\ we plot the acceptance of the Tp^ 
move as a function of time-step for systems containing 1, 2, 4, 
8, 16, and 32 oxygen atoms. For each system size, the proba- 
bility goes to zero at small time-steps and increases for larger 
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values of r as expected from the expression of Tp^ (Eq 
The acceptance increases with the size of the system; as a 
function of the time-step, it approaches its asymptotic value 
of one more quickly for the larger systems. 

To better understand the overall behavior of the algorithm 
with increasing system size, we also analyze the FN energy 
as a function of time-step. We are interested in comparing the 
results obtained with the conventional LA approach and with 
the algorithm of Ref. 7 described in the previous section. For 
a more meaningful and clear comparison with conventional 
DMC with the LA which employs a symmetrized branching 
factor, we modify the original algorithm as described in the 
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FIG. 2: Upper panel: DMC FN energy per atom for systems of M 
isolated oxygen atoms for r = 0.1. For given time-step, the results 
of the algorithm of Ref. 7 (red circles) approach those of the LA (blue 
triangles) upon increasing the system size, whereas the present al- 
gorithm (DMC version 1 , green squares) gives values independent of 
M. In the three algorithms, the branching factor is updated after hav- 
ing moved all the electrons (AE branching). Lower panel: time-step 
dependence of the DMC FN energy for the same three algorithms 
for M = 1 (filled symbols) and M = 32 (open symbols). The algo- 
rithm of Ref!- 7 -' has a problematic extrapolation to zero time-step for 
large enough system size. For M = 32, the linear extrapolation (red 
curve with open symbols) is consistent, as expected, with the corre- 
sponding result for M = 1 (red curve with filled circles). However 
a better % 2 would be obtained with a quadratic extrapolation, which 
in turn would require a sudden upturn at very small r to recover the 
correct zero time-step limit. 



previous section to also use a symmetrized branching factor: 



cxp[-r(£ L (R) + £ L (R'))/2] 



(16) 



where R and R' are the coordinates before the drift-diffusion 
move of the first electron and after the drift-diffusion move 
of the last electron, respectively, if the electrons are displaced 
subsequently. Such a simple modification is allowed as it only 
entails a different time-step error, which we actually find to be 
significantly smaller than the one obtained with the algorithm 
of RefPas we detail in the SectionM 



From the results reported in Fig. |2j we observe that the FN 
energies obtained with the algorithm of Ref. 7 significantly in- 
crease with M and approach the energies obtained with the 
LA algorithm. Already with a system with 32 oxygen atoms, 
the FN energies at r = 0.1 obtained with these two ap- 
proaches become equivalent within the error bars. The en- 
ergies given by the two algorithms must however extrapolate 
to different values as the time-step goes to zero 7 . The lower 
panel of Fig. [2] shows that for M — 32, in particular, they 
have to depart from each other within a tiny time-step inter- 
val near the origin. Because of this behavior, the algorithm of 
Ref Pis bound to have a problematic extrapolation to the zero 
time-step limit for large enough systems. 



A. Size-consistent formulations: version 1 



F71 

To address this problem, the original algorithm of Ref can 
be easily reformulated in a size-consistent manner by observ- 
ing that the transition T NL (Eq. 10 1 can be factorized as 



T (R'j R, r) 
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N 

n* NL (rU,r) : 



(17) 



where v NL is the non-local potential acting only on one elec- 
tron due to all the atomic centers so that the total non-local po- 
tential is given by the sum over the electrons, (R'| V NL |R) = 



We defined the matrix element v r > r as 



• r , r 



* T (ri • • 



4<r'|i/ NL |r, : 



(18) 



The transition i NL (r^, r^, t) displaces the i-th electron over 
the grid of quadrature points generated by considering only 
the i-th electron and all pseudoatoms which host the i-th elec- 
tron in their core region. 

The FN approximation is applied separately to each single- 
electron transition as 



t^(r',r,T) = 6 r , r -Tv; 



(19) 



where v^ r is defined in analogy to the case of the total non- 
local potential so that only the positive transition elements are 
kept in the transition matrix. In this formulation, the third 
step of the DMC algorithm detailed above consists of a loop 
over the electrons where each electron is subsequently moved 
according to the single-electron transition, tp^. Therefore, 
while in the original algorithm of Ref. 7 the configuration gen- 
erated in the Tpyf step differs from the starting configuration 
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only in the coordinate of one electron, the configuration re- 
sulting from this size-consistent move will generally change 
in more than one electronic coordinate and the number of elec- 
trons being moved will increase with the size of the system. 

To understand that the drift-diffusion-branching steps in the 
original algorithm do not need to be modified, we observe that 
the expression of the effective Hamiltonian H c g we are work- 



ing with is the same as in Eq. 13 in the limit of r going to 
zero. In particular, the sign-flip term obtained by summing 
all discarded terms vZ- over all the electrons is equal to the 



sign-flip term in V^ c (Eq. 
we have that, to order r, 



12 1 to zero-order in r. Similarly, 
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(20) 



and we recover the same branching factor as in the original 
algorithm. Therefore, both algorithms extrapolate to the same 
limit at zero time-steps. We will refer to this improved al- 
gorithm as "DMC version 1", to distinguish it from another 
size-consistent version we will define later in this section. We 
stress here that by "version 1" we do not only mean the use of 
the product of single-particle tp^ in step 3, but also the sym- 
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metrization of the weights in step 2, as described in Eq 
where the initial configuration is taken before the diffusion 
process (step 1), and the final is the one after step 1. 

The acceptance as a function of time-step using the size- 
consistent DMC algorithm (version 1) is shown in Fig.[TJ We 
only report the result obtained with M = 1 as the curves for 
the other system sizes are exactly equivalent within statistical 
error. This finding can be easily understood since the proba- 
bility of moving a given electron on the grid generated by con- 
sidering all centers will be practically the same as the proba- 
bility computed using only the atom close to the electron as 
all other centers are at least 30 A far apart. Therefore, in the 
new size-consistent algorithm, when more atoms are added 
to increase the size of the system, the loop over the electrons 
will ensure that each electron attempts a move around its clos- 
est center. The acceptance remains therefore constant as more 
oxygen atoms are added. In a more realistic systems (e.g. with 
closer oxygen atoms), there will be a weak dependence on the 
size of the system but, after enough atoms have been added 
for most electrons to experience an equivalent environment, 
the acceptance will become independent on the system size 
given the short-range nature of the non-local components of 
the pseudopotentials. 

The FN energies obtained for the oxygen systems with this 
size-consistent algorithm are compared in Fig. [2] with the re- 
sults of the LA and of the original algorithm. We observe 
that, as expected, the FN energies of the size-consistent algo- 
rithm extrapolate to the same value as the original algorithm 
as t goes to zero. On the other hand, while the size-consistent 
FN energies are close to the values obtained with the original 
method for the smallest, one-atom system, the FN results ob- 
tained by the two methods depart from each other as the sys- 
tem size increases. Importantly, the FN energies of the size- 
consistent scheme do not approach the LA results for large 
systems at finite r and their extrapolation to zero time-step is 



therefore as smooth for large as for small systems. 



B. Size-consistent formulations: version 2 

An alternative scheme to address the size-consistent prob- 
lem of the original algorithm of Re f. 3 can be obtained through 
a different route by starting from Eq.[l0| and breaking it up in 
N terms with time-step of t/N, such that: 



£t f n n l (R',R,t) = 



R' 



R,i ■ ■ ■ R \ i 



N 



NL/p o 
FN X^i ' 



i,r/N) 



(21) 

with = R', Ro = R, and the sum over the quadrature 
points sampled by the chain {Ro, ■ . . , Rj, • • • , Rjv} gener- 
ated during the random walk. This is another way to evaluate 
the quantity in Eq. [20] The difference is that Eq. [2T| involves 
a product of N all-electron factors, while Eq. |20|is a factor- 
ization of N single-electron terms. Both will avoid the satu- 
ration of the acceptance probability of the non-local Green's 
function Tp^, and therefore they will ensure a size-consistent 
time-step error. Since Eq. 21 requires the calculation of all 



matrix elements Vr/.r each time, it is more convenient to split 
the N factors in such a way that the diffusion move involving 
the i-th electron could be placed between the i — 1-th and z-th 
factor, and the corresponding branching weight updated as a 
product of subsequent single-electron components: 



N 

n 



exp 



N 



,rjv) 



(22) 



where we can exploit the knowledge of Vr/r to compute also 
El for every single-electron move. We will call this algorithm 
"DMC version 2"-21. It consists of the following steps: 

1. Diffusion move of the i— th electron. 

2. The weight of the walker is multiplied by the branching 
factor cxp [-^E L (r' 1 . . . r' i} r i+1 . . . , r N )] . 

3. The walker moves to R" accord- 
ing to the transition probability 
T F N N L (R", R, t/N)/ £ r ,„ T f n n l (R", R, t/N) 
which involves all the electrons. 

In contrast to the original algorithm, and the "DMC version 
1", these three steps need to be performed inside a loop over 
the electrons. In the "version 2" formulation of the DMC 
size-consistent algorithm, each electron drifts and diffuses in 
a time r and the branching factor is updated at each SE move 
with the total local energy El and time t/N in the exponent. 
After each SE branching update, a non-local transition is per- 
formed with r F ^ 1 (R',R, t/N), where one electron among 
all electrons is displaced over the grid of quadrature points 
obtained by considering all electrons and all pseudoatoms. 
Therefore, the electron displaced in the non-local move may 
differ from the electron which is currently being moved in the 
drift-diffusion step. 



6 



IV. LRDMC AND NON-LOCAL PSEUDOPOTENTIALS 

The main difference between the effective DMC Hamilto- 
nian reported in Eq. 



13 and the LRDMC one is the kinetic 



operator JC. In the LRDMC approach, JC is replaced by a dis- 
cretized Laplacian and treated on the same footing as V NL . In 
the original formulation, the discretized Laplacian is a linear 
combination of two discrete operators with incommensurate 
lattice spaces a and a', introduced to sample densely the con- 
tinuous space by performing discrete moves whose length is 
either a or a'. This method can be simplified by noticing that 
all the continuous space can be visited using only a single dis- 
placement length a, provided we randomize the direction of 
the Cartesian coordinates each time the electron positions are 
updated. The randomization of the lattice mesh is similar to 
the well established approach used to perform the angular in- 
tegration in the non-local part of the pseudopotential 4 . There- 
fore, in the LRDMC approach, we can extend the definition 
of the kinetic part by including both the discretized Laplacian 
and the non-local part of the pseudopotentials. The total non- 
local operator reads: 



N 



NL 



(23) 



where A"(0,-,<^) is the Laplacian acting on the z-th elec- 
tron and discretized to second order so that A"(#,,<^) = 
A, + 0(a 2 ). The discretized Laplacian is computed in a frame 
rotated by the angles and fa, which are chosen randomly 
and independently of the ones used to compute V . In this 
formulation, we need to evaluate only 6 off-diagonal elements 
of the Green's function instead of 12 as in the original algo- 
rithm, gaining a speed-up of a factor of 2 in full-core calcula- 
tions and of (12 + n quac i)/(6 + n quac i) with pseudopotentials, 
where n qua( j is the number of quadrature points per electron 14 . 
We notice that, with this simplification, the LRDMC error in 
the extrapolation to the continuous limit depends on a single 
parameter a, and the method can therefore be compared fairly 
with the DMC approach where the discretization of the diffu- 
sion process also depends on a single scale, i.e. the time-step 

T. 

In the LRDMC choice of the Hamiltonian, we further reg- 
ularize the single-particle operator v, defined as the electron- 
ion Coulomb interaction in full-core atoms or the local part 



of the pseudopotential, so that v (r-j) — > Vf (R) as 



Vf (R) = i/(rO 



(A, — Af)^r(R) 



(24) 



2* T (R) 

when acting on the i-th electron. The single-particle operator 
v acquires therefore a many -body term and Vf (R) depends 
on the all-electron configuration R. The total potential term 
is then given by 



N 



i=l 



V? + V ee + V„ 



(25) 



where no regularization is employed in the electron-electron 
V ee and ion-ion V nn Coulomb terms. This lattice regulariza- 
tion leads to an approximate Hamiltonian H a = JC a + V a 



which converges to the exact Hamiltonian as = % + 
a 2 AW. for a ->• 0, where we denote with a 2 AW the 0(a 2 ) 
LRDMC error onH. 

The lattice Green's function Monte Carlo algorithm can 
then be employed to sample exactly the lattice regularized 
Green's function, A — W a , and project the trial wave function 
i&T to the approximate ground state \jLRDMC wn j cn fulfills 
the fixed-node constraint based on ^y, in complete analogy 
to the DMC framework 6 . Note that, since the spectrum of H. a 
is not bounded from above, we need to take the limit A — >• oo, 
which can be handled with no loss of efficiency as described in 
Ref. [U The usual DMC Trotter breakup results in a time-step 
error, while the LRDMC formulation yields a lattice-space er- 
ror, but both approaches share the same upper bound property 
and converge to the same projected FN energy in the limit of 
zero time-step and lattice-space, respectively. 

Since the discretized Laplacian and the non-local poten- 
tial are treated on the same footing, and the sampling of the 
Green's function is based on a sequence of single-particle 
moves generated both from the Laplacian and the non-local 
part, the LRDMC is intrinsically size-consistent (in the sense 
previously discussed for the DMC algorithm), and no modifi- 
cation is necessary to make the lattice-space bias independent 
of the system size. It will depend however on the quality of 
the trial wave function in the way detailed below. 



A. Small a 2 correction for good trial function 



The regularization of the potential (Eq. 24 1 in the definition 



of the lattice Hamiltonian H a implies that the correction AH. 
satisfies: 



AU\^ T ) = 0. 



(26) 



Using this property, we can estimate the leading-order error 
of the lattice regularization by simple perturbation theory as 

E a = £° + a 2 (*LR. D MC| AH |^LRDM C) 

= E° + a 2 (^ RDMC - ^ T \AH\^ RDMC - * T ) 
= E° + 0(a 2 \^ RDMC -* T | 2 ), (27) 

where E a is the expectation value of the Hamiltonian T-L a on 
the approximate FN ground state v]/L RDMC an d E° the estrap- 
olated value as a — > 0. Thus, the approach to the continuous 
limit is particularly fast for good trial functions, namely for 
close to the ground state solution, since vpLRDMC j s a state 
with lower energy than ^>t and has to approach the ground 
state at least as &x does. The leading corrections to the con- 
tinuous limit are quadratic in the wave function error. This 
property is not easily generalized to the usual DMC method 
and, to our knowledge, has not been established so far. 



B. Well defined lattice regularization 

As in any lattice model, the Hamiltonian H a has a finite 
ground state energy only if the potential V a is always lim- 
ited from below. If V a (Ro) = — oo for some configuration 
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Ro, the variational state \&(R) = i5r.r will have unbounded 
negative energy expectation value and the ground state energy 
of % a is not defined. Unfortunately, the regularized potential 
V a (R) in Eq. (24 1 is not bounded from below when R be- 
longs to the (3N — 1) -dimensional nodal surface Af defined 
by the equation 4 r j-(R) = 0. To cure these divergences, we 
need to be able to establish when a configuration is close to 
the nodal surface. In the lattice regularized formulation, we 
can assign an electron position to the nodal surface, i.e. 
r, G Af a , if ^t(^z + afl) has the opposite sign of ^t^) for 
at least one of the six points used to evaluate the finite differ- 
ence Laplacian (i.e. fl is one of the six unit vectors ±x, ±y or 
±z of the reference frame randomly oriented according to the 
angles 0i and fa). Af a correctly defines the nodal surface Af 
in the limit a — >• 0. 

With this definition of nodal surface, we can modify Vf so 
that it remains finite when G Af a : 



Vf(R) = 



Max[i/( ri ),V?(R)] if^GA^ 



Vf(R) 



otherwise 



(28) 



If r; ^ Af a , we use the original LRDMC definition of Vf 
since Vf remains finite even when an electron approaches a 
nucleus for trial functions which satisfy the electron-ion cusp 
conditions. If G Af a , we need to distinguish two cases. If 
the electron is not close to a nucleus, the regularized Vf can 
diverge negatively while v(ri) remains finite and, according 
to Eq. 



28 



the potential Vf coincides with v. If the electron is 
close to a nucleus in a full-core calculation, both Vf and v(v,f) 
diverge, so we need to further regularize v(ri) in the right 



hand side of Eq. ( 28 i and use an expression bounded from 
below. In this particular case, we choose to replace the diver- 
gent electron-ion contribution —Z/\r in \ in v{rf) with —Z/a 
whenever the electron-ion distance |r-j n | < a. 

If we employ the regularized potential V a in the Hamilto- 
nian H a , we no longer satisfy Eq. (26i and, in principle, it 



is not possible to compute E a by averaging the local energy 
H^t/^t- However, the use of V a introduces only negligible 
errors in the computation of E a because the regularization is 
adopted only in a region of volume S x a, where S is the area 
of the nodal surface Af. Since both the trial and the LRDMC 
wave function vanish ~ a close to the nodal surface, the finite 
lattice error corresponds to averaging (W a — H)^t/^t(oc a) 
over "fx^FAf fl2 ) m a n °dal region of extension oc a. If we 
collect these contributions, we find that the present regulariza- 
tion introduces a bias in the nodal region which vanishes as a 4 
for a — » and is always negligible compared to the dominant 
contribution O(a 2 \'^o — ^t\ 2 )- Moreover, since the regu- 



larization in Eq. ( |28| > acts independently on each electron, it 
does not affect the size-consistent character of the algorithm, 
and the energy of N independent atoms at large distances is 
equal to N times the energy of a single atom. Therefore, we 
did not perform any LRDMC calculations for the oxygen sys- 
tems since the energy per atom as a function of a is exactly 
independent of N. 



V. PERFORMANCE OF THE PROPOSED METHODS 

An important point to address is the efficiency of our re- 
vised techniques. This involves not only the computational 
cost per Monte Carlo step, but also the elimination of the dis- 
cretization error (in time or space, as appropriate) by extrapo- 
lation to the continuum limit. Indeed, a smaller and smoother 
bias enhances the overall efficiency, as does the knowledge of 
the leading term in the discretization parameter. 



A. Time-step error 

We study the time-step error on the FN energy computed 
with the various algorithms discussed above, using the Oxi- 
rane molecule (C2H4O) as a test case. Our aim here is in 
particular to assess the reduction of the time-step error with 
respect to the original algorithm 7 . In the DMC "version 1" 
this reduction is due to the symmetrization of the weights, 
while in the DMC "version 2" it is due to the update of the 
branching factor after single-electron moves. We employ non- 
local energy-consistent Hartree-Fock pseudopotentials 1 ^ for 
the oxygen and the carbon atoms in combination with the 
corresponding cc-pVDZ basis sets, and construct two single- 
determinant Jastrow-Slater wave functions of different qual- 
ity. The first wave function is built from B3LYP orbitals 
and a very simple electron-electron Jastrow factor of the form 
b[l — exp(— Krij)]/n, where 6 = 1/2 or 1/4 for antiparallel- 
and parallel-spin electrons, respectively. The parameter k is 
optimized in energy minimization and is equal to 1.91. The 
second wave function is characterized by a more sophisticated 
Jastrow factor comprising of electron-electron, and electron- 
nucleus terms, and all orbital and Jastrow parameters in the 
wave function are optimized in energy minimization. 

The top panel of Fig. [3] shows results obtained with the 
simple wave function. Consistently with previous studies on 
the water molecule^, the LA energies extrapolate to a lower 
value (not necessarily variational) than the original algorithm 
of Ref. 7 , with a smaller time-step error; symmetrization of the 
branching factor in the original algorithm is already sufficient 
to reduce the time-step error down to a value similar to that 
found in the LA. As expected, given the small size of the sys- 
tem considered, the original and the size-consistent algorithm 
give nearly identical results, as shown here for its version 1 
with AE branching. 

The main result shown in the top panel of Fig. [3] is the re- 
markable reduction of the time-step error obtained with a SE 
branching factor. The data shown in the Figure refer to version 
2 of the size-consistent algorithm. We also mention, without 
reporting the data, that when the branching factor is updated 
after SE moves, the symmetrization of the local energy in the 
exponent does not improve the time-step error significantly. 

The improvement obtained with a SE branching factor, 
however, is strongly dependent on the quality of the trial func- 
tion. The lower panel of Fig. [3] shows results obtained with 
the more sophisticated wave function. We still find a lower 
energy with the LA (with its possibly problematic behavior 
at very small time-step), and a large time-step error with the 
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FIG. 3: FN energies as a function of time-step for the Oxirane 
(C2H4O) molecule, obtained using a simple (top) and a more so- 
phisticated (bottom) trial wave function. We employ different 
schemes, i.e. the original algorithm as in Ref. 7 and with an improved 
symmetrized branching factor ("sym"), the two size-consistent ap- 
proaches we proposed ("DMC version 1", and "DMC version 2"), 
the LA approach, and the LRDMC method. The lattice-space has 
been mapped into the time-step via the relation r = 0.6 a 2 , which 
guarantees the same autocorrelation time between the "DMC version 
1" and the "LRDMC" method in this particular case. 



original algorithm of RefH All the other cases, however, dis- 
play similar behavior, or at least comparable quality, in terms 
of the time-step error. 

Also the LRDMC energy values are reported in Fig. [3] 
where the lattice-space has been converted into time-step 
based on the equal auto-correlation time between Monte Carlo 
generations in the DMC and LRDMC algorithms. This is 
the fairest mapping since it keeps the final statistical error 
equivalent for the same sample length. In this case, it gives 
t ~ 0.6 a 2 . One can see that the LRDMC energies are always 
converging from below in a monotonic way, usually easier to 
extrapolate than the corresponding DMC energies. 

In order to make a more quantitative analysis of the pre- 



dictions reported in Section IV for the lattice-space error, 
we studied the lattice-space extrapolation of the Oxirane 
molecule with the DFT-B3LYP Slater determinant, and Jas- 
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FIG. 4: FN LRDMC energies as a function of lattice-space a for 
the Oxirane (C2H4O) molecule, obtained using three types of las- 
trow factors. The fitting curves include a quadratic and quartic term, 
namely f(a) = Eo + ba 2 + ca 4 . The finite lattice space error im- 
proves dramatically with a better wavefunction. For the simple 2- 
body Jastrow, b « —0.15, while for the most accurate Jastrow factor 
b' ~ —0.03. The ratio of the variances of the two trial wave func- 



tions is roughly equal to 6/6', in agreement with Eq. 1 27 



trow factors going from the simple 2-body one, to the most 
complicated comprising of one-, two-, and three-body terms. 
The results are reported in Fig. [4] For good trial wave func- 
tions, a reliable extrapolation can be obtained even by using 
very large values of a, where small statistical errors can be ob- 
tained with much less computational effort. Also, the FN en- 
ergies are basically independent of the shape of the trial wave 
function already for a rather simple Jastrow with 1-body and 
2-body terms, implying that the "locality error" becomes neg- 
ligible in the variational formulation even for not-so-accurate 
trial wave functions. This consideration applies also to the 
DMC variational energies, since the zero-lattice-space zero- 
time-step limits are equivalent. 



B. Relative efficiency 

In all the methods presented here, there is an extra compu- 
tational cost per Monte Carlo step with respect to the stan- 
dard DMC with LA since an extra step is needed in order 
to sample correctly the Green's function related to the non- 
local pseudopotentials. However, we have seen that, in all 
the variational methods, the non-local pseudopotential opera- 
tor will displace only one electron a time since the non-local 
pseudopotential gives a one-body contribution to the Hamil- 
tonian. This means that, in order to update all the quan- 
tities needed by the simulation as the wave function ratios, 
Vr/ r, the gradients and Laplacian terms, one can exploit the 
Sherman-Morrison algebra, which scales as N 2 . For instance, 
to update the non-local term Vr/.r, as well as the JC a in the 
LRDMC, one employs the same algebra as the one used to 
update the gradient (i.e. the drift term) in the standard DMC 
with importance sampling. After a single particle move, the 
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cost to fully update Vr/.r scales as n g N 2 where n g- is the 
number of non-local mesh points per electron (n Q ff = n qua( j in 
DMC with non-local pseudopotentials and n Q ff = n quac j + 6 in 
LRDMC with non-local pseudopotentials and a single lattice- 
space in the Laplacian). In the size-consistent DMC (both 
"version 1" and "version 2"), the pseudopotential move has to 
be performed N times in a single time-step r, so the overall 
cost per time-step coming from the pseudopotential operator 
is r] n Q ff TV 3 , where rj is the acceptance ratio of the non-local 
part. Since n Q ff w 20, and r\ ~ 0.1 at convergence (see Fig.[T]i, 
it is clear that the DMC "version 1" will be only a pref actor 
R3 2 slower than the standard DMC with LA. The "version 2" 
might be slightly slower than the "version 1" since it requires 
the calculation of the local energy after each single-particle 
move, but again the difference will be just a prefactor. The 
LRDMC approach is the slowest because the total number of 
operations in a cycle with N single-electron updates of the 
local energy takes (10 + n qua( j)/(4 + n qua( j) < 2.5 more 
operations (the worst case is for full-core calculations when 
n qU ad = 0). Moreover, there is also an additional slowing 
down compared to the DMC "version 1" approach because, 
in the latter case, all operations involving the local energy can 
be done at the end of a cycle and cast in a very efficient form 
using matrix-matrix multiplications of size ~ N. These oper- 
ations, for large N >~ 1000, can be much more efficient than 
single-electron matrix updates (by a factor ranging from 2 to 
20, depending on the computer hardware and software). At 
present, it is difficult to estimate how much slower LRDMC 
will be on a particular machine 17 , also considering that fur- 
ther algorithmic and software developments are expected in 
the near future, which should allow faster updates. However, 
even though LRDMC is certainly slower, it has the advantage 
of a much smoother lattice-space extrapolation as discussed 
above. 



VI. CONCLUSIONS 

In conclusion, we have introduced important developments 
in the DMC and LRDMC methods in the context of electronic 
structure simulations with non-local pseudopotentials. 

We have explained how to modify the DMC variational for- 
mulation for non-local potentials of Ref. 7 in order to make 
it size-consistent. We have shown that, for large systems, 
the original algorithm 7 will depart from the usual localiza- 
tion approximation only for small time-steps, making the zero 
time-step extrapolation possibly problematic. Instead, the 
two DMC algorithms presented here, based on a more accu- 
rate Trotter break-up for the non-local operator and a better 
branching factor, have a smaller and size-consistent time-step 



error. The DMC version 1, which features a single-particle 
representation of the non-local operator and a branching factor 
symmetric with respect to the application of the diffusion op- 
erator, is straightforward to implement in the existing codes. 
The DMC version 2 is closer to the LRDMC spirit, since the 
non-local part is further split in t/N factor always acting on 
the all-electron configuration, and the branching factor is ac- 
cumulated after every single-particle move. The latter ver- 
sion can give an even better time-step error (order 0(t/N) in 
the non-local part), particularly for relatively poor wave func- 
tions. In general, it is slightly more time consuming than the 
version 1, since it requires the evaluation of the full non local 
matrix after every single-particle move. 

We have made significant progress also in the LRDMC ap- 
proach. In the present formulation, it is no longer necessary 
to use two lattice meshes to randomize the electron position, 
but a single lattice space a is sufficient, provided the orienta- 
tion of the Cartesian coordinates of the discretized Laplacian 
is changed randomly during the diffusion process. We have 
defined a better lattice regularization of the Hamiltonian in 
order to have always a potential bounded from below, with 
a cutoff depending on a. This leads to a well defined and 
size-consistent lattice-space extrapolation since, in the a — > 
limit, we recover the variational expectation value of the con- 
tinuous Hamiltonian with a lattice space error whose leading 
term is quadratic in a. Moreover, we showed that the prefactor 
of the a 2 term vanishes quadratically in | — | ■ Therefore, 
for good wave functions, the extrapolation to the a — > limit 
is particularly rapid and smooth with a computational effort 
cx I /a 2 . The DMC error appears instead to be less corre- 
lated to the quality of the guiding function and may display 
a turn-down behavior for small time-steps (observed here and 
elsewhere^), which makes the time-step extrapolation much 
harder than in the LRDMC lattice-space approach. Regard- 
ing the computational cost, the LRDMC approach is slower 
but the overall efficiency is comparable to the two variational 
and size-consistent DMC formulations presented here since 
LRDMC allows one to work with large values of a due to the 
robust extrapolation to the zero lattice-space limit. 



Acknowledgments 

CF acknowledges the support from the Stichting Nationale 
Computerfaciliteiten (NCF-NWO) for the use of the SARA 
supercomputer facilities. MC thanks the computational sup- 
port provided by the NCSA of the University of Illinois at 
Urbana-Champaign. SS and SM acknowledge support from 
CINECA and COFIN'07. 



1 W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, Rev. 
Mod. Phys. 73, 33 (2001). 

2 D. M. Ceperley, J. Stat. Phys. 43, 815 (1986). 

3 B. L. Hammond, P. J. Reynolds, W. A. Lester, Jr., J. Chem. Phys. 



87, 1130(1987). 

4 S. Fahy, X. W. Wang and Steven G. Louie, Phys. Rev. B 42, 3503 
(1990). 

5 L. Mitas, E. L. Shirley and D. M. Ceperley, J. Chem. Phys. 95, 



10 



3467 (1991). 

6 M. Casula, C. Filippi, and S. Sorella, Phys. Rev. Lett. 95, 100201 
(2005). 

7 M. Casula, Phys. Rev. B 74, 161102(R) (2006). 

8 S. Sorella and L. Capriotti, Phys. Rev. B 61, 2599 (2000). 

9 D. F. B. ten Haaf, H. J. M. van Bemmel, J. M. J. van Leeuwen, W. 
van Saarloos, and D. M. Ceperley, Phys. Rev. B 51, 13039 (1995). 

10 M. Burkatzki, C. Filippi, and M. Dolg, J. Chem. Phys. 126, 
234105 (2007). 

" C. Filippi and C. J. Umrigar, J. Chem. Phys. 105, 213 (1996). As 
Jastrow correlation factor, we use the exponential of the sum of 
two fifth-order polynomials of the electron-nuclear (e-n) and the 
electron-electron (e-e) distances, respectively. The Jastrow factor 
is adapted to deal with pseudo-atoms, and the scaling factor k is 
set to 0.8 a.u. for the oxygen systems and 0.5 a.u. for oxirane. 

12 C. J. Umrigar, J. Toulouse, C. Filippi, S. Sorella, and R. G. Pfen- 
ning, Phys. Rev. Lett. 98, 110201 (2007). 

13 The short- time propagator of DMC version 2 lends itself to an 



implementation of the Reptation quantum Monte Carlo method 
[S. Baroni and S. Moroni, Phys. Rev. Lett. 82, 4745 (1999)] with 
single-particle moves. 

For heavy elements treated either as full-core atoms or with small- 
core pseudopotentails, the two-mesh discretization might remain 
a better choice since it allows a more efficient sampling of the 
different length scales in the core and valence regions. 
I.G. Gurtubay and R.J. Needs, J. Chem. Phys. 127, 124306 (2007). 
M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. 
Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, 
S. Su, T. L. Windus, M. Dupuis, J. A. Montgomery, J. Comput. 
Chem. 14, 1347 (1993). 

The relative efficiency between LRDMC and "DMC version 1" 
within the same machine (Intel quad-core) and the same code^is 
less than a factor 2 for the cases shown in this work. 
S. Sorella, https://qe-forge.org/projects/turborvb/. 



